Woefzela - An Open-Source Platform for ASR Data Collection in the Developing World
Abstract
Building transcribed speech corpora for under-resourced languages plays a pivotal role in developing speech technologies for such languages. We have developed an open-source tool for devices running the Android operating system to facilitate the efficient collection of speech data for Automatic Speech Recognition system development. The tool was designed for use in typical developing-world conditions; we present the relevant design choices and analyse the effectiveness of this tool by means of a case study. In particular, we introduce a novel semi-real-time quality monitoring system, which increases the efficiency of the data collection process.
Related resources
A smartphone-based ASR data collection tool for under-resourced languages
Acoustic data collection for automatic speech recognition (ASR) purposes is a particularly challenging task when working with under-resourced languages, many of which are found in the developing world. We provide a brief overview of related data collection strategies, highlighting some of the salient issues pertaining to collecting ASR data for under-resourced languages. We then describe the dev...
Developing a Model of Sustainable Innovation in Public Hospitals Using Grounded Theory
Introduction: Today, sustainable innovation is recognized as a way to gain a competitive advantage and solve social and environmental problems. The aim of this study was to develop a model of sustainable innovation in public hospitals in Tehran using Grounded Theory. Methods: The present study is a qualitative study that was performed by Grounded Theory method...
NeurOSS - Open Source Software for Neuropsychological Rehabilitation
In recent years hundreds of successful community-driven open source software projects have incarnated. However, it is quite hard to find similar success stories in the field of neuropsychological rehabilitation. This paper describes the core ideas of the NeurOSS project. The project aims at building an open source software platform for developing tools for neuropsychological rehabilitation, and...
Open-source mobile digital platform for clinical trial data collection in low-resource settings
BACKGROUND Governments, universities and pan-African research networks are building durable infrastructure and capabilities for biomedical research in Africa. This offers the opportunity to adopt from the outset innovative approaches and technologies that would be challenging to retrofit into fully established research infrastructures such as those regularly found in high-income countries. In t...
Quality measurements for mobile data collection in the developing world
The collection of speech data suitable for speech technology development is a challenge for under-resourced languages. Factors such as cost, availability of mother-tongue speakers and vast geographic distances call for techniques to optimise the data collection process in order to reduce re-collection of data. The use of mobile devices facilitates remote speech data collection. Although mobile (...
Publication date: 2011